ABSTRACT


This paper proposes a new technique for alphabetic 
data representation. A reduction in storage space
of 40 to 45 percent should be easily achieved. 


A general program flow chart for encoding and de- 
coding the data is also provided. A case is pre- 
sented for hardware implementation of this method. 
I. Introduction


The IBM System/360 introduced the eight bit byte as the standard unit
of data representation. The reasons for its selection are noted elsewhere.
This medium is quite satisfactory for numeric data, as two digits
may be represented in a single byte. Each four bit half byte, which
has a capacity of sixteen states, carries one decimal digit, thereby
wasting only 3/8 ((16-10)/16) of the theoretical information power of the
byte.


Alphameric data for display purposes is another matter. The current
line printers print about fifty graphics, comprising the alphabet, numbers,
special characters, and blank. Therefore, this type of data
could be contained in six bits, which will hold 64 different characters.
This means that the byte is only 1/4 (64/256) utilized, an unsatisfactory
factor.
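
The two ratios quoted above follow from counting states. A minimal sketch of the arithmetic in Python is given here; the variable names are ours and do not appear in the paper.

```python
from fractions import Fraction

# Packed decimal: a four bit half byte has 16 states, 10 of which carry a digit.
half_byte_states = 2 ** 4
wasted_numeric = Fraction(half_byte_states - 10, half_byte_states)

# Alphameric: six bits (64 states) would suffice, but a byte offers 256 states.
used_alphameric = Fraction(2 ** 6, 2 ** 8)

print(wasted_numeric)   # 3/8
print(used_alphameric)  # 1/4
```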


This paper presents a method that permits the use of hexadecimal (4 bit)
coding to represent fifty-five graphics. Theoretically, reductions of
50 per cent are possible; in practice, however, 40 to 45 per cent is
to be expected over one character per byte storage. Clearly there
are profound implications for storage requirements on tape and direct
access volumes, and for data transfer rates.


The method is unique in that by using four bit coding the stream pro- 
cessing instructions of System/360 are available to assist encoding 
and decoding the data. Further, the user is able to optimize the method 
for his own data sets. 


II. The Method 


To understand this method, one must not think of data as static, residing
in storage or on a printed page, but rather as a dynamic stream flowing
past an observation point. Using hexadecimal coding, we have only 16
possible codes to represent all the desired graphics. Therefore, it is
clear that we must re-use some of these 16 symbols several times to
achieve full data representation. So, we will reserve one character,
the hexadecimal F, to designate that a new coding structure will be used
for all the following characters until another hexadecimal F is encountered.
We will call this hexadecimal F a shift code.


Shift codes do not signify what the new coding structure is to be, so
this information must be taken from the hexadecimal character following
the shift code. We will call this character the table designator code, as
it designates the new coding table. This means that if, in the stream of
data, we find a character not available in the current set of 15 (the 16
possible four bit codes minus the shift code), we must insert a shift code
and table designator to put us in a set which does contain the desired
character.


As our objective is to increase data packing efficiency and since shift 
and table designators do not carry useful information, we must try to re- 
duce their frequency or find a way to make them carry significant informa- 
tion to increase packing efficiency. 


To this end, it is well to study the usage of the characters of the alphabet
in the construction of ordinary English language text. Figure 1 illustrates
these relative usages. We see that the first thirteen characters of this
table account for 85 percent of all usage. Therefore, we will call these
the prime text characters, and the remaining 13, the residue characters.


If we combine the prime text characters with a blank and one of the 13
residue characters, we can create thirteen coding tables (using 15 codes
each) that cover the entire alphabet in one table or another, each table
containing the prime text characters. Further, we can make the hexadecimal
code for the residue character contained in each table identical
to the table designator. This will allow the table designator, as it appears
in the data stream following a shift code, to also designate the actual
graphic desired, thereby eliminating the loss of the table designator as a
vehicle for information. Figure 2 is an illustration of this kind of table.
Note that in this figure, 13 separate coding structures are indicated for
the 15 available hexadecimal codes. The coding tables are arrayed
vertically to correspond to the hexadecimal code indicated on the left axis.
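
The shift code and table designator mechanism can be sketched as follows. This is a minimal illustration in Python, not the routines of the flow charts. The function names, the dictionary layout, and the starting table argument are assumptions made for the sketch. Figure 2 is not reproduced, so the tables hold only the few entries used by the worked decoding example in this section (the entries for N and G are inferred from the sample message rather than stated in the text). The three special tables for numerics and special characters are not handled.

```python
SHIFT = "F"  # reserved half byte that announces a change of coding table

# Partial, illustrative tables: designator -> {hex code: character}.
# Only entries used by the paper's worked example are listed (assumption).
PARTIAL_TABLES = {
    "0": {"3": "O", "B": "L", "D": "F"},
    "4": {"4": "W", "5": "N", "6": "I"},
    "5": {"5": "G"},
}

def decode(stream, tables, start="0"):
    """Decode a string of hex digits into characters."""
    current, out, i = start, [], 0
    while i < len(stream):
        code = stream[i]
        if code == SHIFT:
            # The half byte after the shift code names the new table and is
            # also the code of that table's residue character.
            current = stream[i + 1]
            out.append(tables[current][current])
            i += 2
        else:
            out.append(tables[current][code])
            i += 1
    return "".join(out)

def encode(text, tables, start="0"):
    """Encode characters into a string of hex digits."""
    current, out = start, []
    for ch in text:
        inverse = {c: code for code, c in tables[current].items()}
        if ch in inverse:
            out.append(inverse[ch])
            continue
        # Find the table whose residue character (code == designator) is ch.
        for designator, table in tables.items():
            if table.get(designator) == ch:
                out.append(SHIFT + designator)
                current = designator
                break
        else:
            raise ValueError(f"no table holds {ch!r}")
    return "".join(out)

print(decode("D3BB3F465F5", PARTIAL_TABLES))  # FOLLOWING
print(encode("FOLLOWING", PARTIAL_TABLES))    # D3BB3F465F5
```

Note that the two shifts in this stream cost one half byte each rather than two, because each table designator also stands for the residue character itself.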


in table zero, there will be a shift code followed by a
table designator to shift us to a table containing the
desired residue character.

Let us try a sample data stream using numbers and special characters

To illustrate decoding, let us take the hexadecimal stream D3BB3F465F5.
If we begin at table zero, we find that D decodes to F, 3 to alphabetic O,
B to L, B to L, 3 to O. We then encounter a shift code which says that
the next hexadecimal character is a table designator for a new coding
structure, as well as the hexadecimal code for the next alphabetic
character. Therefore, we shift to table 4 where 4 decodes to W, 6 to I,

Let us encode the message JOE. The J is not to be found in table zero,
so we insert a shift code followed by the table designator A, which is

purpose, three additional tables may be provided using the three unused
table designator codes. These table designators are special, in that
they do not represent the next data character in the stream, so that it
takes a whole byte to shift into one of these tables (shift code plus
table designator). This is not a severe restriction, as numerics will
usually occur as groups, and both numerics and special characters are

E is composed of the prime text characters and a blank. This is provided
to permit an exit from Table D or F where the first


Sample message fragment (b denotes a blank):

    ...IFIEDbASbEITHERbOFbTHEbFOLLOWING

Hexadecimal coding of the portion ITHERbOFbTHEbFOLLOWING:

    61907E3DE190ED3BB3F465F5

This message is 82 bytes long in EBCDIC and 91 half bytes long in
hexadecimal, a saving of 47 percent.


It would also be well to point out that a large volume of numeric data
(particularly small fields) could be represented more efficiently with
this method than with packed decimal, because the sign position would
not be required for each byte.


This method provides a means of representing 54 graphics with a sixteen
state (four bit) structure. This will handle most standard printer arrangements.
However, some applications, such as text printing, may require both
upper and lower case alphabet, and additional special characters. This
could be provided if we were to let one of the special characters in table
F tell us to shift to an entirely new set of coding tables, representing
lower case alphabet and additional special characters (they could be the
same as the original set if desired).


III. Implementation via Software 


The stream of coded half bytes will not have a predictable length due
to the variable packing percentage inherent in this system. Therefore,
it is suggested that the first byte of each coded text be the length of the
coded field in binary representation. This format will make it possible
to use the length directly in instructions, or via an execute instruction
with a general register.


To facilitate encoding and decoding data, the format of the coded bytes 
should be stored as follows. The first half of the coded message should 
be stored in the numeric portion of consecutive byte locations. The re- 
maining half of the message could then be stored in the zone portion of 
the same bytes. The format is shown below with consecutive half bytes 
numbered starting at 1.


           Length |                      Bytes
           byte   |
Zone              | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
Numeric           |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10 |

Length byte = 20 in binary


Note that the length specified is the length of the stream of half bytes,
not including the length byte.
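
The stacked format can be sketched as follows, assuming the half bytes are held as a list of integers from 0 to 15. The function names `stack` and `unstack` are ours, and padding an odd count with a zero half byte is an assumption, since the paper does not say how an odd length is stored.

```python
def stack(half_bytes):
    """Pack half bytes as: length byte, then first half in the numeric
    (low) positions and the remaining half in the zone (high) positions."""
    n = len(half_bytes)
    if n > 255:
        raise ValueError("length must fit in one byte")
    nbytes = (n + 1) // 2
    numeric = list(half_bytes[:nbytes])
    zone = list(half_bytes[nbytes:])
    zone += [0] * (nbytes - len(zone))  # assumed padding for an odd count
    return bytes([n] + [(z << 4) | d for z, d in zip(zone, numeric)])

def unstack(field):
    """Recover the half byte stream from a stacked field."""
    n, data = field[0], field[1:]
    numeric = [b & 0x0F for b in data]
    zone = [b >> 4 for b in data]
    return (numeric + zone)[:n]

stream = [i % 16 for i in range(1, 21)]  # 20 half bytes, as in the figure
packed = stack(stream)
print(len(packed))             # 11: one length byte plus ten data bytes
print(unstack(packed) == stream)  # True
```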


While these encode and decode routines have not been coded, a general
flow chart for each is included as Figures 4 and 5. These routines will
make extensive use of the four bit data handling ability of System/360
to stack and unstack the coded stream with the move zones, move
numerics and move with offset instructions. The stream may be searched
for shift codes or characters not present in the current table with the
translate and test instruction, and the actual translation performed by a
translate instruction. Of course, the speeds of these routines cannot be
determined at this time, but the time necessary for translation might be
more than compensated for by a reduction in I/O transfer time, particularly
if inactive records were not decoded at all.


IV. Implementation via Hardware 


The read only storage and microprogramming ability of the System/360 
would suggest that instructions other than those in the current set could 
be implemented in the hardware. If this could be done in the case of an 
encode alpha and decode alpha instruction, the time necessary for the 
translation could be significantly reduced, as well as eliminating the sub- 
routine storage requirement. 


By using the suggested instruction formats, the user would supply the
encode and decode tables for the instructions. This provides the user
with the ability to analyze the frequency distribution of the characters
of each data set, and thereby optimize the packing. Also, using separate
tables could provide a measure of security for confidential data sets or
in data transmission.


It would seem advantageous to change the stacking format for hardware 
implementation to an over and under arrangement as shown below: 


                                 Bytes
Zone       |        |  2 |  4 |  6 |  8 | 10 | 12 | 14 | 16 | 18 | 20 |
Numeric    | Length |  1 |  3 |  5 |  7 |  9 | 11 | 13 | 15 | 17 | 19 |

Suggested formats for these instructions are shown below: 


Encode Alpha Instruction 


| OP | R1 | R2 | B1 | D1 | B2 | D2 |


SS Format


Operand 1 is the field to receive the packed information (length byte 
position). 


Operand 2 is the field to be packed (length byte position). 


R1 is the register containing the address of the user supplied 256 byte
encoding table. 


Length of the data to be coded is taken from the first byte of operand 2. 
Output length is developed and inserted in the first byte of operand 1. 


R2 is not used.


The area to receive the packed data should be as long as the area containing
the source data, as it is possible that no packing could take place.


Decode Alpha Instruction 


| OP | R1 | R2 | B1 | D1 | B2 | D2 |


Operand 1 is the area to receive the unpacked information (length byte 
position). 


Operand 2 is the data to be unpacked (length byte position). 


R1 is the general register containing the address of the user supplied
256 byte decode table, which may be the same as the encode table.


Length of the coded data is supplied by the first byte of Operand 2. The 
length of the decoded alphameric information is placed in the first byte 
of Operand 1. 


R2 is not used.


The area to receive the unpacked data should be twice as long as the 
coded data area, as it is theoretically possible for this much expansion 
to take place.


Timing 


If these instructions could be made to operate 1/2 as fast as a trans- 
late and test instruction (102 + 16N microseconds on a Model 30, where 
N is the number of bytes processed) then using 30KB tapes, we would 
have only to save four bytes of information to break even, and we would 
gain 28.4 microseconds for each additional byte saved. 


        44.4N = 102 + 16N
        28.4N = 102
            N ≈ 4 bytes to break even
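
The break-even point can be checked numerically. This sketch takes the three constants from the equation above as given (44.4 microseconds of transfer time per byte, and 102 + 16N microseconds for the instruction) and, as the equation does, treats N as both the bytes processed and the bytes saved. The names are ours.

```python
import math

TRANSFER_US_PER_BYTE = 44.4  # transfer time per byte used in the equation
FIXED_US = 102.0             # fixed part of the instruction time
PER_BYTE_US = 16.0           # per byte part of the instruction time

def net_gain_us(n):
    """Transfer time saved minus instruction time spent, in microseconds."""
    return TRANSFER_US_PER_BYTE * n - (FIXED_US + PER_BYTE_US * n)

break_even = FIXED_US / (TRANSFER_US_PER_BYTE - PER_BYTE_US)
print(round(break_even, 1))   # 3.6
print(math.ceil(break_even))  # 4
```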


Of course, each device and CPU must be evaluated for its impact on
thruput speed under this assumption. We must also remember that in
many jobs not every record is active, and inactive records need not be
decoded, providing a pure thruput bonus. Also, records that are decoded
but not altered need not be encoded to be returned to the data set.


V. Application Areas Impacted 


Below are suggested some of the more obvious application areas that 
could be impacted by reduction of 40 to 45% in alpha storage require- 
ment. The readers will probably think of many others. 


1. Text Storage
a. Administrative Terminal Systems 
b. Computer Aided Instruction 
2. Name and Address Files 
3. The Insurance Alpha Index (MIB) File 
4. Historical Data Storage 
5. Information Retrieval Systems 
6. Data Transmission (to increase effective line speed) 


VI. Conclusions and Recommendations 


IBM should seriously consider implementing an alphabetic encode and 
decode instruction on the System/360. This could give us a strong 
competitive edge in thruput and direct access storage capacity. It 
could also open some new application areas that are currently marginal 
because of the cost of storage of alphabetic text. 




Figure 1. Usage of the Alphabet in English Language Text

Columns: letter, usage per 1000, cumulative usage per 1000.
Source: See Reference I

Cumulative usage per 1000, as listed:

221, 303, 381, 454, 522, 589, 654, 713, 757, 793, 822, 850, 878,
904, 926, 941, 956, 970, 983, 993, 997, 998, 999, 1000, 1001


Figure 2. Coding tables, with the hexadecimal code on the vertical axis
and the table designator code on the horizontal axis.

Flow chart: Encode Routine